Low density parity check codec

ABSTRACT

The present invention provides a low-complexity, multi-mode low-density parity-check (LDPC) codec in which the decoding operations are divided into small tasks and a unified hardware architecture is implemented so that the hardware resources can be reused in different modes. In addition, memory access is achieved via routing networks with fixed interconnections and memory address generators, which reduces the complexity of the hardware implementation. Further, the present invention provides an early termination function with which the iterative operations can be terminated early when a threshold is reached, so that the power consumption is reduced. The hardware resources for early termination share a part of the hardware resources with an encoder according to the present invention, so that the complexity of the hardware implementation is also reduced.

FIELD OF THE INVENTION

The present invention relates to a low density parity check (LDPC) codec and particularly to a low-complexity and multi-mode LDPC codec and a method of the same.

BACKGROUND OF THE INVENTION

The LDPC (Low-Density Parity-Check) code is an error correction code applied in encoding/decoding of message transmissions in noisy communication channels. An LDPC code C is defined by a sparse parity-check matrix (abbreviated as PCM hereafter) H. The “low density” in LDPC refers to the low density of “1s (ones)” in the PCM H corresponding to an LDPC code. This characteristic enables low-complexity decoding operations. The LDPC code has superior error correction performance and is thus widely used in channel encoding techniques in the next generation of communication systems, for example, WiMAX (Worldwide Interoperability for Microwave Access).

The WiMAX standard adopts QC-LDPC (quasi-cyclic LDPC) codes, which are block-type error correction codes. Defining a QC-LDPC code C usually requires the definition of a PCM H corresponding to the QC-LDPC code. H can be represented as an M×N (M by N) matrix, which can be expanded from an M_(b)×N_(b) binary base matrix H_(b), wherein M=z×M_(b) and N=z×N_(b), and z is a positive integer called the expansion factor. In matrix H_(b), each “0 (zero)” can be replaced by a z×z zero matrix, and each “1 (one)” can be replaced by a z×z permutation matrix. The z×z permutation matrix is obtained via cyclically right shifting an identity matrix. FIG. 1 shows the representation of a parity-check matrix H, wherein P_(ij) can be a z×z permutation matrix or a z×z zero matrix, and wherein i and j are respectively the row index and the column index of matrix H_(b).

As each permutation matrix is obtained via cyclically right shifting an identity matrix, a binary base matrix and the shift indices of the permutation matrices can be integrated to form a concise prototype matrix H_(bm). The prototype matrix H_(bm) has the same dimension as the binary base matrix H_(b) does. Each “0 (zero)” of the binary base matrix H_(b) is replaced by a blank or a negative value, such as “−1”, representing a zero matrix completely formed of zeros, and each “1 (one)” of the binary base matrix H_(b) is replaced by the displacement value of the cyclic right shifts. The prototype matrix H_(bm) can be directly expanded to obtain a parity-check matrix H. FIG. 2A and FIG. 2B respectively show examples of a prototype matrix H_(bm) and a parity-check matrix H. In FIG. 2B, 0_(3×3) expresses a 3×3 zero matrix.
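The expansion of a prototype matrix into a parity-check matrix can be illustrated with the following minimal Python sketch. The function name `expand_prototype` and the small 2×2 prototype matrix used below are hypothetical and chosen only for illustration; they are not the matrices of FIG. 2A or FIG. 2B. The sketch assumes that a cyclic right shift by s places the 1 of row r at column (r+s) mod z.

```python
def expand_prototype(h_bm, z):
    """Expand a prototype matrix into a binary parity-check matrix.

    A negative entry stands for a z x z zero matrix; a non-negative entry s
    stands for the z x z identity matrix cyclically right-shifted by s.
    """
    m_b, n_b = len(h_bm), len(h_bm[0])
    h = [[0] * (n_b * z) for _ in range(m_b * z)]
    for i, block_row in enumerate(h_bm):
        for j, shift in enumerate(block_row):
            if shift < 0:
                continue  # zero block: leave all entries at 0
            for r in range(z):
                # Row r of the shifted identity has its 1 at column (r + shift) mod z.
                h[i * z + r][j * z + (r + shift) % z] = 1
    return h


# Hypothetical 2 x 2 prototype matrix with expansion factor z = 3.
example_h = expand_prototype([[1, -1], [0, 2]], 3)
```

Each non-negative entry contributes exactly one 1 to every row and every column of its block, so the row weight of H equals the number of non-negative entries in the corresponding row of the prototype matrix.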

The WiMAX standard includes six code classes, 1/2, 2/3A, 2/3B, 3/4A, 3/4B and 5/6, and provides one prototype matrix for each code class. Therefore, there are six different prototype matrices in total in the WiMAX standard. FIG. 3 shows the prototype matrix H_(bm) with the expansion factor z being 96 and the code rate being 5/6. In the WiMAX standard, each class includes 19 different QC-LDPC codes with different code lengths. The code lengths are determined by 19 different expansion factors z, including 24, 28, 32, . . . , and 96. The code length can be represented by 24z. Accordingly, WiMAX has a total of 114 (=6×19) QC-LDPC codes.
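The counts given above can be checked with a short Python sketch. It assumes that the sequence 24, 28, 32, . . . , 96 advances in steps of 4, as the listed values suggest; the variable names are illustrative only.

```python
# Expansion factors 24, 28, 32, ..., 96 (assumed step of 4).
expansion_factors = list(range(24, 97, 4))

# The prototype matrices have 24 block columns, so the code length is 24z.
code_lengths = [24 * z for z in expansion_factors]

code_classes = ["1/2", "2/3A", "2/3B", "3/4A", "3/4B", "5/6"]
total_codes = len(code_classes) * len(expansion_factors)
```

Under this assumption the shortest code has 576 bits and the longest has 2304 bits.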

In the WiMAX standard, each prototype matrix is specified by the code rate. In other words, an LDPC code is specified by the parameters of code rate and code length. Therefore, it is an important subject to simplify the hardware implementation via designing a flexible hardware architecture, whereby most of the hardware resources can be re-used in different WiMAX modes.

An LDPC code is usually graphically expressed by a Tanner graph. The Tanner graph is a bipartite graph. FIG. 4A and FIG. 4B respectively show an LDPC code and the Tanner graph corresponding to the LDPC code. Each row of a PCM H corresponds to a check node, and each column thereof corresponds to a variable node. The PCM H in FIG. 4A has six rows and nine columns. Thus, the corresponding Tanner graph has nine variable nodes in the top layer and six check nodes in the bottom layer, wherein the numbers in the circles and squares respectively represent the column indices and the row indices. When an element (i, j) in the PCM H is 1, there is an edge connecting the i^(th) check node and the j^(th) variable node.
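The correspondence between a PCM and its Tanner graph can be sketched in Python as follows. The function name `tanner_neighbors` and the 2×3 matrix are hypothetical and are not the matrix of FIG. 4A; the sketch only lists, for each check node, the variable nodes it is connected to, and vice versa.

```python
def tanner_neighbors(h):
    """Return the adjacency lists of the Tanner graph of the binary matrix h.

    check_nbrs[i] lists the variable nodes connected to check node i (row i);
    var_nbrs[j] lists the check nodes connected to variable node j (column j).
    """
    num_checks, num_vars = len(h), len(h[0])
    check_nbrs = [[j for j in range(num_vars) if h[i][j] == 1]
                  for i in range(num_checks)]
    var_nbrs = [[i for i in range(num_checks) if h[i][j] == 1]
                for j in range(num_vars)]
    return check_nbrs, var_nbrs


# Hypothetical PCM with two check nodes and three variable nodes.
check_nbrs, var_nbrs = tanner_neighbors([[1, 1, 0],
                                         [0, 1, 1]])
```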

The error correction efficiency of the LDPC code positively correlates with the number of iterations. A noisier channel thus requires a greater number of iterations to improve the BER (Bit Error Rate) performance. For increasing the throughput of a partially-parallel decoder, it is necessary to decrease the number of the processing cycles of a single iteration or the number of iterations to achieve a given BER. A conventional overlapped decoding technology can reduce the number of the processing cycles via scheduling the operations of the check nodes and variable nodes so that some of these operations are undertaken concurrently. Further, in a conventional TPMP (Two Phase Message Passing) decoding technology, the variable nodes can only use the C2V (check to variable) messages generated in the preceding iteration to undertake the updates. Thus, a greater number of iterations is required to obtain a given BER performance.

SUMMARY OF THE INVENTION

One objective of the present invention is to provide a low-complexity and multi-mode LDPC codec and a method of the same, wherein the encoding and decoding operations are partitioned into layers, sub-layers and tasks, and wherein the quasi-cyclic structure of the LDPC codes enables the LDPC codes with different code rates and different code lengths to share the hardware resources, thereby reducing the complexity of the hardware implementation.

Another objective of the present invention is to provide an LDPC codec and a method of the same to reduce the number of iterations, wherein the messages generated in an identical iteration can be used to update other messages, whereby a given BER performance can be achieved with only about one half of the number of iterations.

A further objective of the present invention is to provide an LDPC codec and a method of the same, which provides an early termination function, wherein the early termination function is able to reduce unnecessary decoding operations and is compatible with the layered decoding, thus reducing the power consumption of the system. Also, the early termination circuit in the codec architecture shares a part of the hardware resources with an encoder according to the present invention, thereby decreasing the complexity of the hardware implementation.

According to one embodiment, the method for encoding and decoding an LDPC code of the present invention comprises the steps of: partitioning a PCM of an LDPC code into a plurality of layers in a row-permutation manner; partitioning each of the layers into a plurality of sub-layers by rows; partitioning each of the sub-layers into a plurality of tasks in an out-of-order manner, wherein each of the layers comprises the tasks; and processing the tasks of each layer in a sequential manner. Since the tasks are taken to be the processing units, the present invention can be applied to different classes and different code lengths of the WiMAX standard.

According to one embodiment, the LDPC codec of the present invention comprises a plurality of address generators; a first storage device accessing the addresses supplied by the corresponding address generators; a first routing network; a second routing network; and a plurality of processing units connected to the corresponding first storage device via the first routing network to perform an iterative decoding operation, wherein each of the processing units produces a plurality of decoding operation output values fed back and stored in the first storage device via the second routing network.

The present invention further comprises an early termination circuit, which computes a value from the output of the iterative decoding operation and terminates the iterative decoding operation early if the computed value satisfies a given limit, whereby the number of iterations is decreased and the power consumption is reduced. The present invention further comprises an encoder sharing a part of the hardware resources with the early termination circuit, thereby simplifying the hardware implementation.

BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments of the present invention are described in cooperation with the following drawings to facilitate understanding of the objectives, characteristics and advantages of the present invention.

FIG. 1 is a diagram showing a quasi-cyclic parity-check matrix H;

FIG. 2A and FIG. 2B are diagrams respectively showing a prototype matrix H_(bm) and a parity-check matrix H;

FIG. 3 is a diagram showing a prototype matrix with the expansion factor z being 96 and the code rate being 5/6 in the WiMAX standard;

FIG. 4A and FIG. 4B are diagrams respectively showing an LDPC code and the Tanner graph corresponding to the LDPC code;

FIG. 5A is a diagram showing three sub-matrices H₀′, H₁′, and H₂′ obtained via rearranging the PCM shown in FIG. 2B with LMPD-ICM;

FIG. 5B is a diagram showing a core matrix H₀ corresponding to the matrix H₀′ shown in FIG. 5A obtained by removing the zero columns from H₀′;

FIG. 6 is a diagram showing the layers, sub-layers, tasks and processing sequences of different classes of the WiMAX LDPC codes;

FIG. 7 is a diagram showing a prototype matrix for the WiMAX LDPC code with the expansion factor z being 24 and the code rate being 5/6, and the five corresponding tasks determined by a computer search;

FIG. 8 is a block diagram schematically showing the hardware architecture according to one embodiment of the present invention;

FIG. 9 is a block diagram schematically showing one embodiment in which the hardware architecture of the present invention is applied to a case of code rate 5/6;

FIG. 10A is a diagram schematically showing the hardware architecture of a variable node processor (VNP) of the present invention;

FIG. 10B is a timing diagram of five tasks in decoding of code rate 5/6 in the VNP;

FIG. 11A is a diagram schematically showing a multi-mode address generator according to one embodiment of the present invention;

FIG. 11B is a diagram schematically showing a multi-mode VNU according to one embodiment of the present invention;

FIG. 11C is a diagram schematically showing a multi-mode row-sum calculator according to one embodiment of the present invention; and

FIG. 12 shows the BER performance of decoding an LDPC code with a code length of 2304 according to one embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The embodiments are used to exemplify the present invention. However, persons skilled in the art should understand that the embodiments are not intended to limit the scope of the present invention, and that any equivalent modification or variation according to the spirit of the present invention is also to be included within the scope of the present invention. The present invention is divided into several parts for the convenience of description and demonstration. However, this neither means that the present invention should be practiced according to the division nor implies that the present invention should be realized via a combination of the parts. The present invention is exemplified by the applications of WiMAX. However, the present invention does not limit its applications to WiMAX.

[LMPD-ICM and Tasks]

The present invention adopts an improved layered decoding technology, layered message passing decoding (LMPD) using an identical core matrix (ICM), to achieve a fast convergence speed in BER performance and realize the multi-mode functionality, such as support for the 114 QC-LDPC codes adopted in the WiMAX standard.

According to the LMPD-ICM of the present invention, the rows of the PCM H of a QC-LDPC code C are partitioned into z layers in a row-permutation manner. The l^(th) layer is denoted by H_(l)′ and includes the l^(th) row, the (z+l)^(th) row, . . . , and the ((M_(b)−1)z+l)^(th) row of the PCM H, wherein 0≦l<z, and M_(b) is the number of the rows of a binary base matrix H_(b). The above-mentioned layering method is only an embodiment of the present invention. The present invention does not limit layering to the above-mentioned method. In another embodiment, the l^(th) layer H_(l)′ includes the ((l+a₀)mod(z))^(th) row, the (z+(l+a₁)mod(z))^(th) row, . . . , and the ((M_(b)−1)z+(l+a_(Mb-1))mod(z))^(th) row of the PCM H, wherein 0≦l<z, wherein a₀, a₁, . . . , and a_(Mb-1) are selected from non-negative integers, and wherein M_(b) is the number of the rows of a binary base matrix of the PCM. Further, the zero columns in the matrix H_(l)′ are removed and the core matrix H_(l) of the l^(th) layer is obtained. For example, rearranging the PCM H shown in FIG. 2B yields the three matrices H₀′, H₁′, and H₂′ shown in FIG. 5A. The matrix H₀′ includes the 0^(th), 3^(rd), 6^(th), 9^(th), and 12^(th) rows of the original PCM H, and H₁′ and H₂′ can be obtained in the same way. Removing the zero columns in the matrix H₀′ yields the core matrix H₀ shown in FIG. 5B. The core matrices H₁ and H₂ can be obtained in the same way from H₁′ and H₂′, respectively.
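The row selection and zero-column removal described above can be sketched in Python as follows. The function names `layer_rows` and `core_matrix` and the 4×4 matrix `demo_h` are hypothetical; the optional `offsets` argument models the non-negative integers a₀, a₁, . . . of the alternative embodiment and defaults to all zeros.

```python
def layer_rows(l, z, m_b, offsets=None):
    """Row indices of the PCM that form layer l (0 <= l < z)."""
    if offsets is None:
        offsets = [0] * m_b  # basic embodiment: rows l, z+l, ..., (m_b-1)z+l
    return [b * z + (l + offsets[b]) % z for b in range(m_b)]


def core_matrix(h, rows):
    """Select the given rows of h and drop the columns that are all zero.

    Returns the core matrix and the list of retained column indices.
    """
    sub = [h[r] for r in rows]
    keep = [c for c in range(len(h[0])) if any(row[c] for row in sub)]
    return [[row[c] for c in keep] for row in sub], keep


# Hypothetical 4 x 4 PCM with z = 2 and M_b = 2 (not the matrix of FIG. 2B).
demo_h = [[1, 0, 0, 1],
          [0, 1, 0, 0],
          [0, 0, 1, 0],
          [1, 0, 0, 0]]
demo_core, demo_cols = core_matrix(demo_h, layer_rows(0, 2, 2))
```

With z=3 and M_(b)=5, `layer_rows(0, 3, 5)` gives rows 0, 3, 6, 9 and 12, which agrees with the rows of H₀′ stated above.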

The layering method of LMPD-ICM simplifies the PCM of a QC-LDPC code and reduces the hardware allocation for processing and storage. The acquired H_(l) is a concise form derived from the code bits of a QC-LDPC code. It should be noted that the core matrices H_(l), wherein 0≦l<z, are column-permuted versions of each other. Although row permutation is undertaken in the LMPD-ICM, the simplification or rearrangement of the PCM H does not affect decoding since the order of the rows in the PCM H has no bearing on decoding.

For the WiMAX standard, the LDPC codes of an identical class have almost the same core matrices. Thus, the 19 LDPC codes of an identical class can share identical routing networks. Thereby, the present invention can simplify the routing networks, and all the layers can share the identical routing networks. It is found from the core matrix in FIG. 5B that the dual-diagonal structure of the prototype matrix is preserved (appearing in the entire Part C and the last column of Part B). Such a feature benefits the application of layered encoding, which will be demonstrated later.

The implementation of the LDPC codec of the present invention is based on the above-mentioned LMPD-ICM technology. Table 1 lists the parameters of the six classes of WiMAX LDPC codes. It shows that the layers of different classes respectively have different numbers of rows. In other words, the layer sizes are different in different classes. In order to re-use the hardware, the layer sizes of different classes of the WiMAX standard should be unified. The present invention selects the smallest layer size, i.e. size “4”, from all the classes as the standard size. For a class having a layer size greater than 4, the layer is further partitioned into sub-layers according to the smallest layer size “4”. Thereby, the sizes of all the sub-layers of all the classes are no greater than 4. For the class of code rate 5/6, the layer size is exactly 4, and the sub-layers are thus identical to the layers.

FIG. 6 shows the layers, sub-layers, tasks and processing sequences of different classes of WiMAX LDPC codes. For the class with code rate of 1/2, the prototype matrix H_(bm) thereof is a 12×24 matrix. Performing the layering method of LMPD-ICM on the PCM H and deleting the zero columns yields a core matrix. Further deleting the “0” elements in the core matrix yields a layer sized about 12×7. For the class with code rate of 1/2, the row number of a core matrix thereof is 12, which is greater than the standard size 4. The layer is therefore partitioned into sub-layers according to the standard size 4, and each sub-layer has four rows. Thus, a 12×7 layer matrix can be partitioned into three 4×7 sub-layer matrices. In FIG. 6, the numbers shown in the sub-layers, tasks and processing sequences are their corresponding indices.

TABLE 1
Parameters for Different Code Classes

Code Rate                                       1/2      2/3A{B}                3/4A{B}    5/6
Row Weights                                     {6, 7}   A: {10}, B: {10, 11}   {14, 15}   {20}
Number of Rows for Each Layer                   12       8                      6          4
Number of Sub-layers for Each Layer             3        2                      2          1
Decoding Parallelism                            4        4                      3          4
Number of Cycles Required to Process a Layer    6        6                      8          5

The row weight in Table 1 is the number of 1s in each row of the parity-check matrix. The numbers inside the brackets represent the set of the row weights. From Table 1 and FIG. 6, it is found that the row weights of various classes of WiMAX LDPC codes are close to multiples of 4. Thus, “4” is selected as the standard to further partition the sub-layer into tasks. Thereby, the tasks of different classes have identical or similar sizes. Therefore, using tasks as the processing units enables all the classes of WiMAX LDPC codes to repeatedly use a unified hardware architecture. The complexity of the hardware architecture is accordingly reduced.

Refer to FIG. 6 for the explanation of dividing a layer into tasks. In the class with code rate of 1/2, a 4×7 sub-layer matrix is partitioned into two 4×4 task matrices according to the pre-determined task size, wherein the un-used task entries are assigned a dummy element which does not affect the decoding operations. The other classes of WiMAX LDPC codes can be partitioned similarly to obtain the tasks shown in FIG. 6.

FIG. 6 also shows the processing task sequences of various code rates of WiMAX. In the class of code rate 1/2, the processing task sequence of the l^(th) layer is Task 11, Task 12, Task 21, Task 22, Task 31 and Task 32. Task 11 of the (l+1)^(th) layer then follows, and so on. Therefore, a layer needs 6 cycles to undertake operations. A total of 6z cycles are needed to complete an iteration of the LDPC decoding for the class with code rate of 1/2, wherein z is the expansion factor.
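The relationship among the rows per layer, the sub-layers, the decoding parallelism and the cycles per layer in Table 1 can be reproduced with the following Python sketch. The rounding rules used here (ceiling division by the standard size 4) are an inference that happens to match the entries of Table 1; they are an assumption of this sketch and are not stated as formulas in the description. The function name `layer_schedule` is hypothetical.

```python
import math

STANDARD_SIZE = 4  # standard sub-layer height and task width

# (rows per layer, largest row weight) per code rate, taken from Table 1.
CLASSES = {"1/2": (12, 7), "2/3": (8, 11), "3/4": (6, 15), "5/6": (4, 20)}


def layer_schedule(rows_per_layer, max_row_weight):
    """Return (sub-layers per layer, decoding parallelism, cycles per layer)."""
    sub_layers = math.ceil(rows_per_layer / STANDARD_SIZE)
    parallelism = rows_per_layer // sub_layers  # rows in one sub-layer
    tasks_per_sub_layer = math.ceil(max_row_weight / STANDARD_SIZE)
    cycles_per_layer = sub_layers * tasks_per_sub_layer  # one task per cycle
    return sub_layers, parallelism, cycles_per_layer


def cycles_per_iteration(rate, z):
    """Cycles for one decoding iteration: z layers, each taking cycles_per_layer."""
    return layer_schedule(*CLASSES[rate])[2] * z
```

For example, the class with code rate of 1/2 gives three sub-layers, a parallelism of 4 and 6 cycles per layer, so an iteration takes 6z cycles.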

The hardware configuration should be taken into consideration in partitioning a sub-layer into tasks. Refer to FIG. 7. The prototype matrix H_(bm) of code rate 5/6 and z=24 has 24 block columns. A memory block is used to store the data of the corresponding block column. Therefore, there are 24 memory blocks in total.

In FIG. 7, the 80 non-negative elements of the prototype matrix are divided into 5 groups (matrices), and each group (matrix) includes 16 non-negative elements. The reading of the 16 values (more exactly, the APP values, which will be described in detail later) of a task matrix should be completed within one cycle. However, one memory block can only provide one value within each cycle. Thus, the accesses of the 16 values are distributed over the 24 memory blocks, which causes the out-of-order processing shown by the arrows in FIG. 7. Generally, the arrangement of the tasks is undertaken by a computer. FIG. 7 also shows the five tasks for code rate 5/6 and z=24: TASK 1, TASK 2, TASK 3, TASK 4 and TASK 5.

[Expansion Factors]

In addition to the code rate, the expansion factor z is also a parameter in specifying an LDPC code in the WiMAX standard. There are 19 expansion factors in total, including 24, 28, 32, . . . , and 96. The matrices shown in FIG. 3 and FIG. 7 both correspond to LDPC codes with code rate 5/6 in the WiMAX standard. However, the expansion factor z is 24 for the matrix shown in FIG. 7 and 96 for the matrix shown in FIG. 3. In the WiMAX standard, the LDPC codes in the same code class have an identical code rate but different code lengths, and the non-negative elements (i, j) of the prototype matrix can be calculated with Equation (1):

$\begin{matrix}{{s\left( {i,j,z} \right)} = \left\lfloor \frac{{s\left( {i,j,96} \right)}z}{96} \right\rfloor} & (1)\end{matrix}$

wherein s(i, j, 96) is the non-negative element at the i^(th) row and the j^(th) column of the prototype matrix of the LDPC code with code rate 5/6 when z=96, and wherein z can be 24, 28, 32, . . . , or 96. Equation (1) can transform the matrix in FIG. 3 into the matrix in FIG. 7. For the LDPC codes with the other code rates, the prototype matrices having different code lengths can also be transformed in a similar way.

So far, the transformation of the LDPC codes having different code lengths has been described, the layering of a PCM H has been achieved, and each layer has been further divided into several tasks. Since the size of the tasks is fixed among different code rates, the decoding operations can be realized by sequentially executing a task sequence. A unified task processor can be designed to provide multi-mode decoding.
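Equation (1) can be written as a one-line integer computation, as in the following Python sketch. The function name `scale_shift` is hypothetical, and the sample shift values are chosen only to illustrate the arithmetic; they are not taken from FIG. 3.

```python
def scale_shift(s96, z):
    """Scale a shift value defined for z = 96 to expansion factor z, per Equation (1).

    Negative entries denote zero blocks and are returned unchanged.
    """
    if s96 < 0:
        return s96
    return (s96 * z) // 96  # floor of s(i, j, 96) * z / 96
```

Because s(i, j, 96) is less than 96, the scaled value is always less than z and is therefore a valid shift for a z×z permutation matrix.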

[Decoding Algorithm]

In the present invention, LMPD-OMSA (Layered Message Passing Decoding using Offset Min-Sum Algorithm) is adopted in the check node operations. However, the present invention does not limit the decoding to only using the above-mentioned algorithm. The spirit and principle of the present invention may also be realized with other algorithms.

Let R_(ij) ^((k)) express the check-to-variable (C2V for short) message from the i^(th) check node to the j^(th) variable node, which is generated in the k^(th) iteration. Similarly, let Q_(ji) ^((k)) express the variable-to-check (V2C for short) message from the j^(th) variable node to the i^(th) check node, which is generated in the k^(th) iteration. In LMPD, the rows (equivalent to the check nodes) of a PCM H are divided into L groups (layers), and each group includes M_(L) rows. Thus, L×M_(L)=z×M_(b). Firstly, initialize the C2V messages R_(ij) ⁽⁰⁾ to be zero for all i and all j that belong to I_(R)(i) (j∈I_(R)(i)), wherein I_(R)(i) is an index set of the variable nodes connected with the check node i. Next, sequentially undertake the check-node operations and the variable-node operations for the 0^(th), 1^(st), 2^(nd), . . . , and (L−1)^(th) layers to complete an iteration. Thereby, the LMPD can utilize the C2V messages generated by the earlier layers in the k^(th) iteration to update the V2C message Q_(ji) ^((k)) calculated in a later layer in the same iteration. Therefore, compared with the TPMP algorithm, LMPD can achieve a given BER performance with only about half the number of iterations. More details will be described later.

In the k^(th) iteration of LMPD, the check-node operations and thevariable-node operations of the l^(th) layer are described below.

1. The variable-node operations of the l^(th) layer:

The l^(th) layer includes the check nodes i, where l×M_(L)≦i<(l+1)×M_(L). Each V2C message Q_(ji) ^((k)) related to these M_(L) check nodes is computed from Equation (2):

$\begin{matrix}{Q_{ji}^{(k)} = {\lambda_{j} + {\sum\limits_{\underset{i^{\prime} < {l \cdot M_{L}}}{i^{\prime} \in {I_{C}{(j)}}}}^{\;}R_{i^{\prime}j}^{(k)}} + {\sum\limits_{\underset{i^{\prime} \geq {l \cdot M_{L}}}{i^{\prime} \in {{I_{C}{(j)}} \smallsetminus {\{ i\}}}}}^{\;}R_{i^{\prime}j}^{({k - 1})}}}} & (2)\end{matrix}$

wherein λ_(j) is the reliability value of the variable node j, and I_(C)(j) is the index set of the check nodes connected to the variable node j.

2. The check-node operations of the l^(th) layer:

In the l^(th) layer, each C2V message R_(ij) ^((k)) is computed from Equation (3):

$\begin{matrix}{R_{ij}^{(k)} = {S_{ij}^{(k)} \cdot \max\left\{ {{\min\limits_{j^{\prime} \in {{I_{R}{(i)}} \smallsetminus {\{ j\}}}}\left| Q_{j^{\prime}i}^{(k)} \right|} - \beta},0 \right\}}} & (3)\end{matrix}$

wherein β is a non-negative constant value, and

$\begin{matrix}{S_{ij}^{(k)} = {\prod\limits_{j^{\prime} \in {{I_{R}{(i)}} \smallsetminus {\{ j\}}}}^{\;}\; {{sgn}\left( Q_{j^{\prime}i}^{(k)} \right)}}} & (4)\end{matrix}$

wherein sgn is a sign function.
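The check-node operation of Equations (3) and (4) can be sketched in Python as follows. The function name `check_node_update` is hypothetical, the treatment of a zero-valued message as having a positive sign is an assumption of this sketch, and the straightforward double loop is chosen for readability rather than to reflect the hardware of the CNU.

```python
def check_node_update(q, beta):
    """Offset min-sum update for one check node i.

    q holds the V2C messages Q_{j'i} for all j' in I_R(i) (at least two entries).
    Returns the C2V messages R_{ij}, one per connected variable node j,
    each computed from all entries of q except the one belonging to j.
    """
    r = []
    for j in range(len(q)):
        others = q[:j] + q[j + 1:]
        # Equation (4): product of the signs of the other messages.
        sign = 1
        for value in others:
            if value < 0:
                sign = -sign  # assumption: sgn(0) is treated as +1
        # Equation (3): smallest magnitude minus the offset, clamped at zero.
        magnitude = max(min(abs(value) for value in others) - beta, 0)
        r.append(sign * magnitude)
    return r


demo_r = check_node_update([2.0, -3.0, 0.5, -1.0], 0.25)
```

In this example the message with magnitude 0.5 is the smallest, so every other output has magnitude 0.25, while the output for that variable node itself uses the second-smallest magnitude 1.0.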

At the end of the k^(th) iteration, the APP (a posteriori probability) value Λ_(j) ^((k)) of the j^(th) bit (variable node) is computed from Equation (5):

$\begin{matrix}{\Lambda_{j}^{(k)} = {\lambda_{j} + {\sum\limits_{i^{\prime} \in {I_{C}{(j)}}}^{\;}R_{i^{\prime}j}^{(k)}}}} & (5)\end{matrix}$

Combining Equations (5) and (2), we can obtain Equation (6):

$\begin{matrix}{Q_{ji}^{(k)} = {\Lambda_{j}^{({k - 1})} - R_{ij}^{({k - 1})} + {\sum\limits_{\underset{i^{\prime} < {l \cdot M_{L}}}{i^{\prime} \in {I_{C}{(j)}}}}^{\;}\left\lbrack {R_{i^{\prime}j}^{(k)} - R_{i^{\prime}j}^{({k - 1})}} \right\rbrack}}} & (6)\end{matrix}$

Let

$\Lambda_{j,{l - 1}}^{(k)} = {\Lambda_{j}^{({k - 1})} + {\sum\limits_{\underset{i^{\prime} < {l \cdot M_{L}}}{i^{\prime} \in {I_{C}{(j)}}}}\left\lbrack {R_{i^{\prime}j}^{(k)} - R_{i^{\prime}j}^{({k - 1})}} \right\rbrack}}$

Thus, Equation (6) can be simplified into Equation (7):

$\begin{matrix}{Q_{ji}^{(k)} = {\Lambda_{j,{l - 1}}^{(k)} - R_{ij}^{({k - 1})}}} & (7)\end{matrix}$

The hardware implementation for LMPD can be constructed according to the above discussion. The variable-node operations and check-node operations of the l^(th) layer can be performed respectively according to Equation (7) and Equations (3) and (4). For the l^(th) layer in the k^(th) iteration, the Λ_(j,l) ^((k)) value can be computed from Equation (8):

$\begin{matrix}{\Lambda_{j,l}^{(k)} = {\Lambda_{j,{l - 1}}^{(k)} + {\sum\limits_{\underset{{l \cdot M_{L}} \leq i^{\prime} < {{({l + 1})} \cdot M_{L}}}{i^{\prime} \in {I_{C}{(j)}}}}^{\;}\left\lbrack {R_{i^{\prime}j}^{(k)} - R_{i^{\prime}j}^{({k - 1})}} \right\rbrack}}} & (8)\end{matrix}$

In the l^(th) layer, if a specified variable node j is connected with only one check node i, Equation (8) can be simplified into Equation (9):

$\begin{matrix}{\Lambda_{j,l}^{(k)} = {Q_{ji}^{(k)} + R_{ij}^{(k)}}} & (9)\end{matrix}$

Similarly, if a specified variable node j is connected with only two check nodes i₁ and i₂ in the l^(th) layer, Equation (8) can be simplified into Equation (10):

$\begin{matrix}{\Lambda_{j,l}^{(k)} = {\Lambda_{j,{l - 1}}^{(k)} - R_{i_{1}j}^{({k - 1})} - R_{i_{2}j}^{({k - 1})} + R_{i_{1}j}^{(k)} + R_{i_{2}j}^{(k)}}} & (10)\end{matrix}$
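The per-row update formed by Equation (7), Equations (3) and (4), and Equation (9) can be sketched in Python as follows. The sketch covers only the case in which each variable node is connected with a single check node of the layer, so that Equation (9) applies. The function names `offset_min_sum` and `process_row`, the list-based APP storage and the sample values are hypothetical and serve only to illustrate the data flow.

```python
def offset_min_sum(q, beta):
    """C2V messages of one check node per Equations (3) and (4)."""
    out = []
    for j in range(len(q)):
        others = q[:j] + q[j + 1:]
        sign = -1 if sum(1 for v in others if v < 0) % 2 else 1
        out.append(sign * max(min(abs(v) for v in others) - beta, 0))
    return out


def process_row(app, cols, r_old, beta):
    """Process one check node (row) of the current layer.

    app   : list of APP values, indexed by variable node; updated in place
    cols  : variable nodes connected to this check node
    r_old : C2V messages of this row from the preceding iteration
    Returns the new C2V messages of this row.
    """
    # Equation (7): remove the old C2V contribution from the APP value.
    q = [app[j] - r for j, r in zip(cols, r_old)]
    # Equations (3) and (4): compute the new C2V messages.
    r_new = offset_min_sum(q, beta)
    # Equation (9): add the new C2V contribution back.
    for j, q_j, r_j in zip(cols, q, r_new):
        app[j] = q_j + r_j
    return r_new


demo_app = [1.0, -2.0, 4.0]
demo_r_new = process_row(demo_app, [0, 1, 2], [0.0, 0.0, 0.0], 0.5)
```

Only the APP values and the C2V messages need to be stored between layers; the V2C messages are recomputed from them when a row is processed.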

[Overview of Hardware Architecture]

FIG. 8 is a block diagram schematically showing the hardware architecture according to one embodiment of the present invention. FIG. 8 is not intended to limit the scope of the present invention but is only used to demonstrate one embodiment of the present invention. FIG. 8 does not present the exact numbers of the functional blocks but only demonstrates the hardware functions schematically.

As shown in FIG. 8, the codec architecture of the present invention comprises an address ROM (read-only memory) 1. The address ROM 1 stores information about task arrangement for APP memory access. Initially, the information is loaded from the address ROM 1 to the address generators 2. The address generators 2 automatically generate the required addresses to read/write the associated APP values from/to the APP memory bank 3. The APP memory bank 3 is a first storage device comprising a plurality of memory blocks. The address ROM 1 is a second storage device used to store the initial addresses needed by the APP memory blocks of the APP memory bank 3, the first storage device. Based on the quasi-cyclic structure and the LMPD-ICM, the same class of LDPC codes shares the same routing networks. For example, as shown in FIG. 8, a first routing network 4 is connected with a corresponding VNU (Variable Node Unit) 5. In the VNU 5, the Λ_(j,l-1) ^((k)) value read from the APP memory bank 3 and R_(ij) ^((k-1)) generated in the preceding iteration are used to compute Q_(ji) ^((k)) according to Equation (7). A CNU (Check Node Unit) 6 uses the computed Q_(ji) ^((k)) to compute the required R_(ij) ^((k)) according to Equations (3) and (4) and stores the R_(ij) ^((k)) into the R memory bank 7. The details will be described later. The VNU 5 also computes the Λ_(j,l) ^((k)) value according to Equation (8) and writes the Λ_(j,l) ^((k)) value into the corresponding memory block in the APP memory bank 3 via a second routing network 10. In FIG. 8, the combination of the VNU 5 and the CNU 6, which are encircled by a dotted rectangle, may be regarded as a processing unit 12.

The codec architecture of the present invention also comprises an early termination circuit 8, and a termination threshold is input to the early termination circuit 8. For example, as shown in FIG. 8, an expansion factor z serves as the termination threshold. When the termination condition is satisfied, a controller 11 is used to terminate the decoding operations early. The early termination circuit 8 and an encoder 9 of the invention jointly share a portion of the hardware resources. Thereby, the complexity of hardware implementation is decreased, and the power consumption is reduced. The details will be described later.

[Address Generator and APP Memory]

As shown in FIG. 6, in rate 5/6, the five tasks Task 1, Task 2, Task 3, Task 4, and Task 5 are processed in sequence when the l^(th) layer is operated, and then Task 1, Task 2, . . . , of the (l+1)^(th) layer are processed, and so on. Based on the quasi-cyclic characteristic of the PCM, Task 1 of the l^(th) layer and Task 1 of the (l+1)^(th) layer come from different addresses of the same memory block of the APP memory bank 3. Refer to FIG. 7 for an example. In the 0^(th) row of Task 1 of the 0^(th) layer, the APP value in the 6^(th) address of the 1^(st) block (denoted by 6(1)) of the APP memory bank 3 has to be accessed and assigned to the Input 3 of VNU₀. For the 1^(st) layer, the Input 3 of VNU₀ is assigned the APP value positioned at the address shifted from the above-mentioned address by one unit in the same memory block of the APP memory bank 3. In other words, the 7^(th) address of the 1^(st) block of the APP memory bank 3 has to be accessed. The above-mentioned principle is also applied to other similar cases.

The above discussion is summarized as follows. Suppose the initial address of an APP value associated with the 0^(th) layer is s(j), which means the address s of the j^(th) memory block. Based on the quasi-cyclic characteristic of the LDPC codes in the WiMAX standard, the address of the APP value associated with the l^(th) layer corresponding to the same task entry can be expressed in a general form of (s(j)+l)mod(z), wherein mod means a modulo operation. It has to be noted that when the expansion factor z changes, only the initial address varies with the change of the expansion factor, and the index j of the memory block remains unchanged.
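The address rule (s(j)+l)mod(z) can be illustrated with the following Python sketch. The function name `app_address` is hypothetical; the first sample value follows the example of the address 6(1) given above with z=24.

```python
def app_address(s, l, z):
    """Address, within its memory block, of the APP value used by layer l.

    s is the initial address for layer 0; the memory block index does not change.
    """
    return (s + l) % z


# Layer-by-layer addresses of the entry that starts at address 6 when z = 24.
demo_addresses = [app_address(6, l, 24) for l in range(24)]
```

Over the z layers of one iteration the entry visits every address of its memory block exactly once and wraps around from z−1 to 0.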

FIG. 9 is a block diagram schematically showing one embodiment that thepresent invention is applied to a case of code rate 5/6. The addressgenerators 2 and the APP memory bank 3 are respectively shown in theleft and middle of the upper region of FIG. 9. The APP memory bank 3,which consists of 24 APP memory blocks, is a storage device. Refer toFIG. 7 again. The prototype matrix of code rate 5/6 has 24 columns, andeach column needs a memory block to store the corresponding data.Therefore, the APP memory bank 3 in FIG. 9 includes 24 memory blocks,which are designated as APP Mem_(i), wherein 0≦i<24. Each memory blockof the APP memory bank 3 is used to store the APP values of acorresponding block column of PCM. The read/write access for the 24memory blocks of the APP memory bank 3 is controlled by the 24 addressgenerators 2, which are designated as Address Generator_(i), wherein0≦i<24. The 16 APP values Λ_(j,l-1) ^((k)) associated with a task can beread from the APP memory bank 3 via the corresponding address generators2 and the first routing network 4. The VNU 5 and CNU 6 process the 16APP values Λ_(j,l-1) ^((k)), update Λ_(j,l-1) ^((k)) to be Λ_(j,l)^((k)) and write Λ_(j,l) ^((k)) into the APP memory bank 3 via thesecond routing network 10.

Refer to FIG. 6 again. In the case of code rate 5/6, processing a layer needs 5 cycles, and an iteration of code rate 5/6 needs 5z cycles, wherein z is the expansion factor. Therefore, each address generator 2 in FIG. 9 includes a 5-stage shift register 21.

According to the above discussion and based on the quasi-cyclic characteristic of the LDPC code in the WiMAX standard, the addresses for the l^(th) layer can be deduced from the address s(j) of the 0^(th) layer and expressed as (s(j)+l)mod(z), wherein mod means a modulo function. Refer to FIG. 9 for an example. A 5-stage shift register 21 outputs an address x to the corresponding APP memory block of the APP memory bank 3. At the same time, a calculation unit 22 performs a calculation of (x+1)mod(z) to obtain the address for the next layer (the original ordinal number plus 1), and feeds back the new address to the input terminals of the 5-stage shift register 21.

Since the time difference between reading and writing of every task is maintained the same, the addresses for writing data to the APP memory bank 3 can be worked out from the information stored in some stages of the 5-stage shift register 21. In the modes of rate 5/6, the time difference is 12 clock cycles, which is more than twice the number of cycles needed to process a layer (5 cycles, corresponding to the 5 registers). Therefore, a functional unit of (x′−3)mod(z) is used to compute a correct writing address. In FIG. 9, each address generator 2 includes a writing address unit 23. The latency of each writing address is 12 clock cycles. Therefore, an address x′ is processed to obtain (x′−3)mod(z), which is used as the writing address for the APP memory bank 3. Each address generator 2 also has an index calculator 24, which computes the initial addresses for different expansion factors z according to Equation (1).
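The behavior of one address generator can be sketched in software as follows. This is only an illustrative model, not the circuit of FIG. 9: the function name, the initial addresses and the choice of which register stage supplies x′ (the variable `tap`) are assumptions, since the description states only that the writing address is derived from "some stages" of the shift register. The subtrahend y is computed from the latency so that it equals 3 for a 5-cycle layer with a 12-cycle latency.

```python
from collections import deque


def simulate_address_generator(initial, z, cycles, latency):
    """Model one address generator; returns (read_addrs, write_addrs).

    initial : one starting address per task of a layer (register stages)
    latency : assumed read-to-write delay in clock cycles
    """
    n = len(initial)                 # register depth = cycles per layer
    y = latency // n + 1             # subtrahend: 3 for n=5, latency=12
    tap = n - 1 - (latency % n)      # hypothetical stage that supplies x'
    reg = deque(initial)             # reg[0] is the output stage
    reads, writes = [], []
    for t in range(cycles):
        x = reg.popleft()            # address sent to the APP memory block
        reads.append(x)
        reg.append((x + 1) % z)      # feedback: same task, next layer
        if t >= latency:
            x_prime = reg[tap]
            writes.append((x_prime - y) % z)
        else:
            writes.append(None)      # pipeline not yet full
    return reads, writes
```

Under these assumptions each writing address equals the reading address issued `latency` cycles earlier, which is the property the (x′−3)mod(z) unit is meant to provide.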

[CNU and VNU]

As shown in FIG. 6, a layer of code rate 5/6 has 4 rows, and each sub-layer also has 4 rows. In other words, the layer is identical to the sub-layer in the class of code rate 5/6. The number of rows in one sub-layer is the decoding parallelism. Therefore, the decoding parallelism is 4 for the class of code rate 5/6. The decoding parallelisms for the other code rates are listed in Table 1. The row weight of a PCM H of code rate 5/6 is 20. Therefore, each row is related to 20 V2C messages and 20 C2V messages. In each cycle, four V2C messages related to an identical row are sent to an identical CNU 6 and processed there. As the decoding parallelism is 4 in the class of code rate 5/6, four CNUs 6 are used to process four rows of each layer.

The embodiment in which the present invention is applied to a case of code rate 5/6 is shown in FIG. 9. In this embodiment, there are four CNUs 6 designated by CNU_(i), wherein 0≦i<4. In the four VNUs 5, the related APP values Λ_(j,l-1) ^((k)) and C2V messages R_(ij) ^((k-1)) are used to compute 16 V2C messages Q_(ji) ^((k)). The Q_(ji) ^((k)) values are respectively sent to four CNUs 6. Each CNU 6 can be pipelined into four stages. In the first stage, the absolute values of four V2C messages |Q_(ji) ^((k))| are sent to a four-input comparator unit 61 (CMP4 unit 61 for short). The CMP4 unit 61 has six comparators (not shown in the drawings). Every pair of the four inputs is compared to find out the two minimums thereof. Thus C₂⁴=6 comparators (one per pair of inputs) are involved.

In the second stage, a simplified CMP4 unit 62 (S-CMP4 unit 62 for short) that has only three comparators (not shown in the drawings) is used to find out the two minimums of |Q_(ji) ^((k))|, the absolute values of the V2C messages, in an iterative manner. Two values of the four inputs are derived from the feedback path from the output of the S-CMP4 unit 62 itself. As the four input values of the S-CMP4 unit 62 have been partially ordered, the number of comparators of the S-CMP4 unit 62 is less than that used in the preceding stage.
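The two comparison stages can be illustrated with the following sketch. It is one possible realization under stated assumptions, not the circuit itself: the function names are hypothetical, ties are broken in favor of the lower input index, and the three comparisons in `s_cmp4` are merely one arrangement that exploits the fact that both input pairs are already ordered.

```python
from itertools import combinations


def cmp4(x):
    """First stage: two smallest of four magnitudes via 6 pairwise compares."""
    rank = [0, 0, 0, 0]
    for i, j in combinations(range(4), 2):   # C(4,2) = 6 comparators
        if x[i] <= x[j]:
            rank[j] += 1                     # x[j] loses to x[i]
        else:
            rank[i] += 1
    return x[rank.index(0)], x[rank.index(1)]


def s_cmp4(prev, new):
    """Second stage: merge two ordered pairs using three comparisons."""
    p1, p2 = prev
    n1, n2 = new
    if p1 <= n1:                             # comparator 1
        return p1, min(p2, n1)               # comparator 2
    return n1, min(p1, n2)                   # comparator 3


def two_minimums(magnitudes):
    """Scan a row four magnitudes per cycle, feeding the result back."""
    best = (float("inf"), float("inf"))
    for k in range(0, len(magnitudes), 4):
        best = s_cmp4(best, cmp4(magnitudes[k:k + 4]))
    return best
```

For a rate-5/6 row, `two_minimums` would be called on 20 magnitudes, i.e. five groups of four, matching the five tasks of a layer.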

In the third stage, a sign unit 631 is used to compute the sign of the new C2V messages R_(ij) ^((k)) according to Equation (4). In this stage, the offset compensation of the two minimums generated by Equation (3) is simultaneously completed in two first subtractors 632. In this stage, the output is a compressed form of the 20 updated C2V messages R_(ij) ^((k)), which is simultaneously sent to a corresponding compressed C2V memory bank 71 of an R memory bank 7 and stored there in a compressed format. When these C2V messages are required in the following iteration, the compressed updated C2V messages R_(ij) ^((k)) are accessed from the C2V memory bank 71 and then decompressed by a decompressor 72. The number of memory blocks in the R memory bank 7 is equal to the decoding parallelism, and the memory blocks are designated by R Mem_(i), wherein 0≦i<4. In the last stage, the C2V calculator 64 simultaneously works out the 20 C2V messages R_(ij) ^((k)), which are sent to the VNU 5 for updating the corresponding APP values.

Each CNU 6 iteratively compares the magnitudes of the 4 input V2C messages Q_(ji) ^((k)), and computes the 20 corresponding C2V messages. Then, the 20 updated C2V messages R_(ij) ^((k)) are simultaneously sent to the corresponding VNUs 5. As processing a layer needs five cycles, an iteration needs a total of 5z cycles.
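The compressed storage and later decompression of the C2V messages can be sketched as below. Equations (3) and (4) are not restated in this passage, so the sketch assumes the customary offset min-sum rule: each C2V magnitude is the smallest magnitude among the other inputs of the row minus an offset, clipped at zero, and each C2V sign is the product of the other inputs' signs. The offset value `beta`, the record layout and the function names are all hypothetical.

```python
def compress_row(q, beta=1):
    """Compress one row's V2C messages into (min1, min2, idx1, signs)."""
    mags = [abs(v) for v in q]
    idx1 = min(range(len(q)), key=mags.__getitem__)   # position of min1
    min1 = mags[idx1]
    min2 = min(m for k, m in enumerate(mags) if k != idx1)
    signs = [v < 0 for v in q]                        # True = negative
    # offset compensation of the two minimums, clipped at zero
    return max(min1 - beta, 0), max(min2 - beta, 0), idx1, signs


def decompress_row(min1, min2, idx1, signs):
    """Rebuild every C2V message of the row from the compressed record."""
    total = sum(signs) % 2                            # parity of all signs
    out = []
    for k, s in enumerate(signs):
        mag = min2 if k == idx1 else min1             # exclude own input
        negative = (total - s) % 2 == 1               # sign of the others
        out.append(-mag if negative else mag)
    return out
```

For example, `decompress_row(*compress_row([3, -5, 2, -7]))` yields `[1, -1, 2, -1]`. Only two magnitudes, one index and the sign bits are kept per row, which is why the stored form is smaller than the 20 individual messages.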

Refer to FIG. 9 again. In the embodiment in which the present invention is applied to a case of code rate 5/6, there are four VNUs 5 designated by VNU_(i), wherein 0≦i<4. In the second subtractor 51 of each VNU 5, the related APP values Λ_(j,l-1) ^((k)) and C2V messages R_(ij) ^((k-1)) are used to compute 16 V2C messages Q_(ji) ^((k)) according to Equation (7). As mentioned above, the V2C messages Q_(ji) ^((k)) are sent to the four CNUs 6. On the other hand, the V2C messages Q_(ji) ^((k)) are also sent to the FIFO (First-In-First-Out) buffer 500 of every VNU 5 for performing the required variable-node operation.

Refer to FIG. 10A, a diagram schematically showing the hardware architecture for a variable node processor (VNP). The VNP includes four VNUs 5 and a parallel-update unit 510. The parallel-update unit 510 is used to overcome data hazards. As shown in FIG. 10A, each VNU 5 can be pipelined into ten stages, and the i^(th) stage is designated by STi. These stages STi are used to compute V2C messages Q_(ji) ^((k)), temporarily store the V2C messages while the magnitudes of these messages are compared in the CNUs, and finally update these V2C messages to the latest APP values Λ_(j,l) ^((k)) according to Equation (9) or (10). Refer to FIG. 7 again. The class of code rate 5/6 includes five task matrices. Thus, the five tasks are sequentially operated in the VNP. FIG. 10B is a timing diagram of the decoding operation of five tasks of code rate 5/6 in the VNP, wherein the i^(th) task is designated by TASKi, and 1≦i≦5.

Refer to FIG. 10A again. The APP values associated with a task are read from the APP memory bank 3 and sent to the ST1 of the VNUs 5 via the first routing network 4. In ST1, the second subtractor 51 subtracts the C2V messages R_(ij) ^((k-1)) calculated in the preceding iteration from the corresponding APP values Λ_(j,l-1) ^((k)) to obtain the V2C messages Q_(ji) ^((k)) according to Equation (7).

ST2 to ST10 of VNU 5 are undertaken in the FIFO buffer 500. In ST2, the V2C messages Q_(ji) ^((k)) are sent to ST3 and the corresponding CNUs 6. While the CNUs 6 compute the C2V messages R_(ij) ^((k)), the registers (not shown in the drawings) of ST2-ST9 temporarily store the V2C messages. The details will be described later.

As the five tasks are executed in sequence, the five tasks of an identical layer should be undertaken in five successive stages. As shown in FIG. 10B, when TASK1 is in ST9 (or, equivalently, when TASK 5 is in ST5), the updated C2V messages R_(ij) ^((k)) have become available. Thus, the APP values related to the five tasks (or a layer) are updated in parallel according to Equation (9). In other words, the APP values related to TASK1, TASK 2, . . . , and TASK 5 are respectively updated in the corresponding stages of the VNUs, e.g., ST9, ST8, . . . , and ST5. Also shown in FIG. 10B are the operations that a task executes at different times. R represents “reading the APP value from the APP memory bank 3”; B represents a buffer stage; Q represents “subtracting the C2V messages from the APP values to obtain the V2C messages”; W represents “writing the APP values back to the APP memory bank 3 via the second routing network”; Π₁ represents the first routing network; FIFO represents “the FIFO buffer 500 in the VNUs 5”. In FIG. 10B, the gray-marked blocks represent that the APP values have been updated in STi.

As mentioned above, the stages ST5-ST9 in the VNUs are utilized to update V2C messages to become the latest APP values. Thus, each of ST5-ST9 includes an adder 52 and a multiplexer 53. If the C2V messages R_(ij) ^((k)) are not yet available, the multiplexer 53 chooses the V2C messages Q_(ji) ^((k)) from the preceding stage. When the CNUs 6 output the C2V messages R_(ij) ^((k)), the adder 52 adds the V2C messages Q_(ji) ^((k)) and the corresponding C2V messages R_(ij) ^((k)) to obtain the updated APP values Λ_(j,l) ^((k)) according to Equation (9). At this time, the multiplexer 53 passes the sum to the next stage. Finally, the updated APP values Λ_(j,l) ^((k)) are written back to the APP memory bank 3 via the second routing network 10 task by task.
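The arithmetic performed along the VNU pipeline can be summarized in a few lines. This is a minimal sketch that follows the verbal description of Equations (7) and (9) given above; the function names are hypothetical, and the saturation implied by fixed-point data widths is deliberately not modeled.

```python
def vnu_update(app_old, c2v_old, c2v_new):
    """Return (V2C message, updated APP value) for one variable node."""
    v2c = app_old - c2v_old          # as described for Equation (7)
    app_new = v2c + c2v_new          # as described for Equation (9)
    return v2c, app_new


def update_stage(v2c, c2v_new=None):
    """One of ST5-ST9: adder followed by a 2-to-1 multiplexer."""
    if c2v_new is None:              # C2V not available yet: pass Q along
        return v2c
    return v2c + c2v_new             # C2V available: forward the sum
```

The `None` value stands in for the multiplexer select signal that indicates whether the CNUs have already produced their outputs.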

[Routing Network]

The first routing network 4 includes sixteen multiplexers respectively serving the sixteen inputs of the four VNUs. Since there are five tasks in a layer of a rate-5/6 WiMAX LDPC code, each multiplexer connects to five APP memory blocks of the APP memory bank 3 (not shown in the drawings). Thus, the size of each multiplexer is 5-to-1. For example, the elements of the first rows and the first columns of the five tasks TASK1, TASK 2, TASK 3, TASK 4 and TASK 5 in FIG. 7 are respectively 0(21), 0(15), 1(14), 20(12) and 21(8). These addresses are related to the 0^(th) input of the 0^(th) VNU (VNU₀). Therefore, the multiplexer serving this input is connected to the 21^(st), 15^(th), 14^(th), 12^(th) and 8^(th) blocks of the APP memory bank 3.
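The fixed wiring of one such multiplexer can be pictured as a small lookup table. The sketch below uses the five FIG. 7 entries quoted above for input 0 of VNU₀; the function name, the zero-based task index and the value z=24 used in the usage note are assumptions made for illustration only.

```python
# (initial address, memory block) per task, taken from the entries quoted
# above for input 0 of VNU0: 0(21), 0(15), 1(14), 20(12), 21(8).
VNU0_IN0 = [(0, 21), (0, 15), (1, 14), (20, 12), (21, 8)]


def read_request(task, layer, z, entries=VNU0_IN0):
    """Return (block, address) fetched for this input in a given cycle.

    task  : 0 for TASK1 ... 4 for TASK5 (selects the 5-to-1 mux input)
    layer : layer index l
    z     : expansion factor (illustrative value chosen by the caller)
    """
    s, block = entries[task]
    return block, (s + layer) % z    # address advances by one per layer
```

With an assumed z=24, `read_request(0, 0, 24)` gives block 21 at address 0, and `read_request(3, 4, 24)` gives block 12 at address 0 after wrap-around. The block column of the pair never depends on z or on the layer, which is what allows the interconnections to stay fixed.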

Similar to the first routing network 4, the second routing network 10 includes twenty-four multiplexers. Each multiplexer (not shown in the drawings) includes at most four inputs because each block of the APP memory bank 3 can contribute at most four APP values to one layer. It should be noted that the column weight of one layer does not exceed 4.

The structure of the first and second routing networks 4 and 10 can be applied to nineteen different codes of the same class in the WiMAX standard because the interconnections between the VNUs and the APP memory blocks remain unchanged when z changes. Please note that the numerals on the buses in FIG. 9 represent the quantities of messages on the buses.

[Early Termination and Encoder]

Refer to FIG. 9 again. After each VNU 5 updates the APP values, the APP values are respectively stored in the VNU 5. Four row-sum calculators 81 and a zero-check unit 82 are used to examine whether the signs of the latest APP values of a layer satisfy the parity-check constraints. When these APP values are updated, the five tasks TASK 1-TASK 5 are respectively located in ST10-ST6 of the VNUs, and then the sign bits of the APP values are sent to the zero-check unit 82 to examine the parity-check constraints. When the output of the row-sum calculator 81 is zero, the layer is defined to be a valid layer. In FIG. 9, an accumulator 83 is used to count the number of successive valid layers. A termination threshold, such as a number equal to the expansion factor z, is input to the accumulator 83. If there are z successive valid layers, it means that the output value of the row-sum calculator 81 for each of the z successive valid layers is zero. In such a case, the decoding result is assumed to be correct. Thus, the decoder is terminated early, and such an action is called early termination. Owing to the inherent regularity of the z layers, all the layers can use the same row-sum calculator 81. Further, the LDPC codes of the same class can also share the same early-termination hardware resource. Please note that the LDPC codes of the same class but of different expansion factors z can use the same early-termination hardware resource by merely modifying the termination threshold, such as setting the required number of successive valid layers to be the expansion factor z. The early termination function of the present invention can reduce the number of iterations and is compatible with the above-mentioned layered decoding.
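The early-termination rule described above can be sketched as follows. The class and function names are hypothetical, and the sketch models only the decision logic (row sum, zero check and the count of successive valid layers), not the pipeline timing.

```python
def row_sum(sign_bits):
    """XOR of the sign bits of one row; 0 means the check is satisfied."""
    acc = 0
    for b in sign_bits:
        acc ^= b
    return acc


class EarlyTermination:
    def __init__(self, threshold):
        self.threshold = threshold   # e.g. the expansion factor z
        self.count = 0               # successive valid layers so far

    def update(self, layer_rows):
        """Feed the sign bits of every row in one layer; True = stop."""
        valid = all(row_sum(r) == 0 for r in layer_rows)   # zero check
        self.count = self.count + 1 if valid else 0
        return self.count >= self.threshold
```

Changing the expansion factor only changes the `threshold` argument, which mirrors the statement that codes of the same class share the same early-termination hardware.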

Refer to FIG. 2B again for the layered encoding of the present invention. The PCM H can be divided into Part A, Part B and Part C. The columns of Part A correspond to the information bits. Part B and Part C correspond to the parity bits. As shown in FIG. 5B, the columns of a core matrix can be similarly divided into Part A, Part B and Part C according to the positions where the columns of the core matrix are located in the PCM H. In FIG. 5B, the dual-diagonal structure appearing in the entire Part C and the last column of Part B can be used to effectively perform the encoding function.

Refer to FIG. 5B again. The sixteen bits v_(j) (j=0, 1, . . . , 15) that are associated with a core matrix H₀ satisfy the five parity-check equations assigned by H₀′. Suppose h_(ij) represents the element in the i^(th) row and the j^(th) column of the core matrix H₀. Thus,

$\begin{matrix}{\sum\limits_{j = 0}^{15}{v_{j}h_{ij}} = 0 \quad \text{for } i = 0, 1, \ldots, 4} & (11)\end{matrix}$

Sum up the five parity-check equations and obtain Equation (12):

$\begin{matrix}{\sum\limits_{i = 0}^{4}\sum\limits_{j = 0}^{9}{v_{j}h_{ij}} + \sum\limits_{i = 0}^{4}{v_{10}h_{i,10}} + \sum\limits_{i = 0}^{4}\sum\limits_{j = 11}^{15}{v_{j}h_{ij}} = 0} & (12)\end{matrix}$

Because of the dual-diagonal structure appearing in Part C and Part B in FIG. 5B, the third term on the left side of Equation (12) is zero, and the second term on the left side is v₁₀. As v₁₀ is the variable to solve for, Equation (12) is rearranged into Equation (13):

$\begin{matrix}{v_{10} = {\sum\limits_{i = 0}^{4}{\sum\limits_{j = 0}^{9}{v_{j}h_{ij}}}}} & (13)\end{matrix}$

Please note that the right side of Equation (13) involves only message bits. The parity bit v₁₀ can thus be obtained from Equation (13). In other words, this parity bit can be worked out with message bits alone. Repeating the same procedures for the other (z−1) layers (core matrices), all the parity bits of Part B corresponding to the PCM in FIG. 2B can be obtained. After the parity bit v₁₀ associated with each layer is encoded, the parity bit v₁₁ must be encoded by another layer with Equation (13) when a layer is loaded to the encoder for the second time, because the 10^(th) bit v₁₀ and the 11^(th) bit v₁₁ associated with a core matrix come from the same block column (Part B). The other four parity bits in Part C of the core matrix H₀ can be obtained from the following Equations (14) and (15):

$\begin{matrix}{v_{12 + m} = {{\sum\limits_{i = {1 + m}}^{4}{\sum\limits_{j = 0}^{9}{v_{j}h_{ij}}}} + v_{10} + v_{11}}} & (14) \\{v_{14 + m} = {{\sum\limits_{i = {3 + m}}^{4}{\sum\limits_{j = 0}^{9}{v_{j}h_{ij}}}} + v_{11}}} & (15)\end{matrix}$

wherein m=0 or 1. Repeating the same procedures for the other (z−1) layers, all the parity bits of Part C of the PCM can be obtained. Thereby, based on the layered encoding, the parity bits can be worked out completely with the message bits.

Refer to FIG. 9 again. The row-sum calculators 81 used in the early termination circuit 8 are also used in the encoding modes of the present invention. By forcing the sign bits corresponding to the parity bits to be zero, the outcome of the row-sum calculators 81 becomes Σ_(j=0) ⁹ v_(j) h_(ij). The obtained values are sent into a parity-bit calculator 91. The parity-bit calculator 91 processes the values to obtain the parity bits v₁₀, v₁₁, . . . , v₁₅ according to Equations (13), (14) and (15) and then stores the parity bits into the APP memory bank 3 via the second routing network 10.
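The parity-bit computation of Equations (13), (14) and (15) can be checked numerically with the sketch below. The list `s` holds the five row sums Σ_(j=0) ⁹ v_(j) h_(ij) delivered by the row-sum calculators, and v₁₁ is passed in as a parameter because it is produced by another layer. FIG. 5B is not reproduced here, so the parity-column layout in `PARITY_COLS` is an assumption: it is one layout for which the second term of Equation (12) reduces to v₁₀ and the third term vanishes, and it serves only to verify the equations. All names are hypothetical.

```python
def encode_core(s, v11):
    """s[i]: row sum over message bits for rows 0..4; v11 from another layer."""
    v10 = sum(s) % 2                                   # Equation (13)
    v = {10: v10, 11: v11}
    for m in (0, 1):
        v[12 + m] = (sum(s[1 + m:]) + v10 + v11) % 2   # Equation (14)
        v[14 + m] = (sum(s[3 + m:]) + v11) % 2         # Equation (15)
    return v


# Assumed parity columns having a 1 in rows 0..4 (not taken from FIG. 5B).
PARITY_COLS = [(11, 12), (12, 13), (10, 13, 14), (14, 15), (11, 15)]


def checks_ok(s, v):
    """True if all five parity-check equations of Equation (11) hold."""
    return all((s[i] + sum(v[c] for c in cols)) % 2 == 0
               for i, cols in enumerate(PARITY_COLS))
```

Under this assumed layout, the parity bits returned by `encode_core` satisfy all five checks for every combination of row sums and either value of v₁₁.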

In the present invention, the encoding operation shares the first and second routing networks 4 and 10, the address generators 2 and the APP memory bank 3 with the decoding operation. Thus, the additional increase in complexity caused by including the encoding functions can be reduced. Further, the encoder 9 also shares a portion of hardware resources with the early termination circuit 8. Therefore, the present invention can reduce the complexity of hardware implementation when both the encoding and decoding functions are included.

[Application of Multi-Rate]

In the present invention, a multi-mode address generator 20 is used to expand the application of hardware to different classes (code rates) of WiMAX LDPC codes, as shown in FIG. 11A. According to FIG. 6, the largest number of processing cycles of each layer is eight for any one of the six classes. Thus, there is an eight-stage shift register 201 appearing in FIG. 11A. As to the application of multi-rate, it should be taken into consideration that the latency between reading and writing a task is different in different code rates. Thus, the writing address is obtained by choosing different stages from the shift register and different subtrahend numbers. For code rate 5/6, as the pipeline latency is greater than twice the number of processing cycles of one layer, y in the equation (x′−y)mod(z) of the writing address unit 23 in FIG. 9 has a value of 3. For other classes, as the latency is smaller than twice the number of processing cycles of one layer, y in these classes is set to be 2. As shown in FIG. 11A, the multi-mode address generator 20 has three additional multiplexers 205. Thereby, the appropriate address is selected to be fed back to the leftmost stage of the shift register through a shifting unit 202 according to different classes. Further, an appropriate value of y and the appropriate output stage of the eight-stage shift register 201 are input to the writing address unit 203. In FIG. 11A, there is also an address multiplexer 206. The address multiplexer 206 is able to select an initial address worked out by an index calculator 204 for different expansion factors or select an address for the next layer from the shifting unit 202.

For CNU 6 and VNU 5, different classes of WiMAX LDPC codes have different numbers of tasks in a sub-layer thereof. In code rate 1/2, the operations of CNU 6 and VNU 5 repeat once every two cycles because each sub-layer includes two tasks. FIG. 11B shows a multi-mode VNU 50, wherein only ST4-ST9 are presented. In the multi-mode VNU 50, the five stages ST5-ST9 are used to update the APP values. The decoding operations of rate-5/6 codes make use of all these stages to respectively update the APP values for TASK1, TASK2, TASK3, TASK4 and TASK5 in ST9, ST8, . . . , ST5. In the code rate 1/2, a sub-layer is partitioned into two tasks, and the APP values are updated in ST8 and ST9. Thus, the output of ST4 is directly forwarded to the input of ST8. In other words, ST5, ST6 and ST7 are bypassed to reduce the pipeline latency. In FIG. 11B, there is a plurality of multiplexers 501 for selecting the stages (STi) to be bypassed in different code rates.

FIG. 11C shows a multi-mode row-sum calculator 90 according to one embodiment of the present invention. In the case of code rate 5/6, while TASK1 is at ST10, the multi-mode row-sum calculator 90 is driven to receive 20 sign bits of the APP values from the last five stages of the VNU 5. The multi-mode row-sum calculator 90 includes a plurality of first-stage XOR units 901 and a second-stage XOR unit 902, which are used to compute the final XOR value of the signs of all APP values of the row. Because the sub-layers of different classes respectively have different numbers of tasks, some of the first-stage XOR units 901 can be ignored. As shown in FIG. 11C, the multi-mode row-sum calculator 90 also has a plurality of multiplexers 903 which input zero to disable ST6, ST7 or ST8 according to different code classes.

[Performance Analysis]

According to the hardware architecture proposed by one embodiment of the present invention, a multi-mode LDPC codec architecture with early termination is fabricated with a 90 nm CMOS process including nine metal layers. The APP values and V2C messages are quantized into 7-bit data, and C2V messages are quantized into 5-bit data. FIG. 12 shows the BER performance based on the WiMAX LDPC codes with a code length of 2304. The floating-point BER performance and the fixed-point BER performance are both shown in FIG. 12 and are respectively presented by the solid curve (FLO) and dashed curve (FIX), wherein A and B respectively denote the types of code rate.

Liu et al. proposed “An LDPC decoder chip based on self-routing network for IEEE 802.16e applications” in IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 684-694, March 2009. The prior art adopts a phase-overlapped MPD technology. In the prior art, when decoding the rate-1/2 length-2304 WiMAX LDPC code, the BER can reach 10⁻⁵ when the iteration number N_(it)=20 and Eb/No=2.2 dB, wherein Eb/No is the SNR value. According to one embodiment of the present invention, a decoder for the same code can achieve the same BER when N_(it)=12 and Eb/No=2.15 dB. In comparison with TPMP, the LMPD-ICM technology adopted by the present invention can achieve the same BER with the number of iterations being greatly reduced.

Further, the present invention includes an early termination circuit to decrease the required iteration number and operations and thus decrease power consumption. Table 2 lists the power consumption under the environment with specified SNR values for obtaining a BER of 10⁻⁵ at z=96. Table 2 shows that decoding of different classes utilizes different regions of hardware and thus has different power consumption. Table 2 also shows that when the early termination circuit is enabled, the power consumption is greatly reduced in all code classes.

TABLE 2
Code Rate                                    1/2     2/3A    2/3B    3/4A    3/4B    5/6
SNR (dB)                                     2.15    2.73    2.72    3.2     3.18    3.95
Power without using early termination (mW)   108.6   111.7   112.6   96.3    96.7    123.8
Power in using early termination (mW)        63.2    62.0    52.5    52.5    53.0    57.8
Percentage of saved power                    41.8%   44.5%   53.4%   45.5%   45.2%   53.3%
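The last row of Table 2 follows from the two power rows. The short check below recomputes it, assuming the saved power is defined as one minus the ratio of the power with early termination to the power without it; the dictionary and function names are hypothetical.

```python
POWER_MW = {            # code rate: (without, with early termination)
    "1/2": (108.6, 63.2),
    "2/3A": (111.7, 62.0),
    "2/3B": (112.6, 52.5),
    "3/4A": (96.3, 52.5),
    "3/4B": (96.7, 53.0),
    "5/6": (123.8, 57.8),
}


def saved_percent(without, with_et):
    """Percentage of power saved, rounded to one decimal place."""
    return round(100.0 * (1.0 - with_et / without), 1)


SAVED = {rate: saved_percent(*pair) for rate, pair in POWER_MW.items()}
```

Under that definition the recomputed values agree with the percentages listed in the table.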

The embodiments described above are only to exemplify the present invention but not to limit the scope of the present invention. Any equivalent modification or variation according to the technical contents or spirit of the present invention is to be also included within the scope of the present invention, which is based on the claims stated below.

What is claimed is:
 1. A low-density parity-check codec, which performs encoding and decoding operations of a parity-check matrix of a low-density parity-check code, comprising: a plurality of address generators; a first storage device comprising a plurality of access addresses supplied by the corresponding address generators; a first routing network; a second routing network; and a plurality of processing units connected with the first storage device via the first routing network and used to perform an iterative decoding operation, wherein a plurality of output values of the decoding operation of each processing unit is fed back to and stored in the first storage device via the second routing network.
 2. The low-density parity-check codec according to claim 1, wherein each of the processing units further comprises a VNU (Variable Node Unit) and a CNU (Check Node Unit); the VNUs are used to perform operation of a plurality of “variable-to-check (V2C)” messages and a plurality of APP values; the CNUs are used to perform operation of a plurality of “check-to-variable (C2V)” messages.
 3. The low-density parity-check codec according to claim 2 further comprising an early termination circuit, wherein the signs of the APP values of the VNUs are computed to obtain values for the early termination circuit; and the iterative decoding operation is terminated if the values satisfy a termination condition.
 4. The low-density parity-check codec according to claim 3, wherein the early termination circuit further comprises a plurality of row-sum calculators corresponding to the VNUs, a zero-check unit and an accumulator; the row-sum calculators are used to compute the values for the early termination circuit and send the values to the zero-check unit for comparing; the zero-check unit examines whether the number of successive zeros of the values is equal to a threshold; if the number of successive zeros of the values is equal to a threshold, the iterative decoding operation is terminated.
 5. The low-density parity-check codec according to claim 4 further comprising an encoder used to perform a layer-encoding operation.
 6. The low-density parity-check codec according to claim 5, wherein the encoder includes the row-sum calculators and a parity-bit calculator; the encoder takes the outputs of the row-sum calculators as the inputs of the parity-bit calculator to perform the layer-encoding operation.
 7. The low-density parity-check codec according to claim 2, wherein each of the VNUs further comprises a subtractor and a FIFO (First-In-First-Out) buffer; the FIFO buffer is a timing buffer area used to wait for the “check-to-variable (C2V)” messages computed by the CNUs corresponding to the VNUs.
 8. The low-density parity-check codec according to claim 2, wherein each CNU further comprises a plurality of comparator units used to find out a plurality of extreme values from a plurality of inputs.
 9. The low-density parity-check codec according to claim 1 further comprising a second storage device used to store a plurality of initial addresses required by the first storage device, wherein the address generator receives the initial addresses as the inputs to compute the access addresses required by the first storage device.
 10. The low-density parity-check codec according to claim 1, wherein each address generator comprises a shift register with a plurality of stages and a shifting unit.
 11. The low-density parity-check codec according to claim 6, wherein the address generators, the VNUs and the row-sum calculators respectively comprise a plurality of multiplexers; the multiplexers take a plurality of code rates to be input values, whereby the low-density parity-check codec is enabled to operate according to one of the code rates.
 12. The method for encoding and decoding a low-density parity-check code according to claim 1, wherein the low-density parity-check code is a quasi-cyclic low-density parity-check code conforming to WiMAX standards.